Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

use ubuntu-latest for tsan CI #2267

Merged
merged 10 commits into from
Aug 21, 2023
Merged

use ubuntu-latest for tsan CI #2267

merged 10 commits into from
Aug 21, 2023

Conversation

lalitb
Copy link
Member

@lalitb lalitb commented Aug 16, 2023

Fixes #2266

Changes

Upgrading to latest virtual env for tsan CI.

For significant contributions please make sure you have completed the following items:

  • CHANGELOG.md updated for non-trivial changes
  • Unit tests have been added
  • Changes in public API reviewed

@lalitb lalitb requested a review from a team August 16, 2023 21:29
@codecov
Copy link

codecov bot commented Aug 16, 2023

Codecov Report

Merging #2267 (51497d9) into main (c0092c4) will not change coverage.
The diff coverage is n/a.

❗ Current head 51497d9 differs from pull request most recent head fe44872. Consider uploading reports for the commit fe44872 to get more accurate results

Additional details and impacted files

Impacted file tree graph

@@           Coverage Diff           @@
##             main    #2267   +/-   ##
=======================================
  Coverage   87.52%   87.52%           
=======================================
  Files         199      199           
  Lines        5981     5981           
=======================================
  Hits         5234     5234           
  Misses        747      747           

@lalitb lalitb enabled auto-merge (squash) August 16, 2023 21:48
@lalitb lalitb disabled auto-merge August 16, 2023 21:56
@owent
Copy link
Member

owent commented Aug 17, 2023

The report of tsan is not a problem of this PR.But I think we should use another mutex lock in BasicCurlHttpTests::onHttpRequest and inside the predicate function of wait_for in BasicCurlHttpTests::waitForRequests.Using mtx_requests here may lead to dead lock.

@ThomsonTan
Copy link
Contributor

The report of tsan is not a problem of this PR.But I think we should use another mutex lock in BasicCurlHttpTests::onHttpRequest and inside the predicate function of wait_for in BasicCurlHttpTests::waitForRequests.Using mtx_requests here may lead to dead lock.

Does this sound like a regression?

@lalitb
Copy link
Member Author

lalitb commented Aug 18, 2023

Using mtx_requests here may lead to dead lock.

@owent - Do you mean separate mutex for onHttpRequest and waitForRequests? The mutex mtx_requests is being used with the conditional variable cv_got_events to synchronize http server and client requests. I thought this shouldn't be a potential race condition, unless there is some bug with thread sanitizer. Not sure if this is relevant, but I found one similar bug here - google/sanitizers#1259, reported with gcc-11.x.

Ok please ignore. I think there is an issue. Let me debug further.

@@ -164,7 +164,7 @@ class BasicCurlHttpTests : public ::testing::Test, public HTTP_SERVER_NS::HttpRe

bool waitForRequests(unsigned timeOutSec, unsigned expected_count = 1)
{
std::unique_lock<std::mutex> lk(mtx_requests);
std::unique_lock<std::mutex> lk(cv_mtx_requests);
if (cv_got_events.wait_for(lk, std::chrono::milliseconds(1000 * timeOutSec), [&] {
//
Copy link
Member

@owent owent Aug 18, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need std::unique_lock<std::mutex> lk1(mtx_requests); here?(In callback of cv_got_events.wait_for)

@owent
Copy link
Member

owent commented Aug 18, 2023

regression

The report of tsan is not a problem of this PR.But I think we should use another mutex lock in BasicCurlHttpTests::onHttpRequest and inside the predicate function of wait_for in BasicCurlHttpTests::waitForRequests.Using mtx_requests here may lead to dead lock.

Does this sound like a regression?

I think it's a dead lock problem and happens under this sequence.

Thread 1 Thread 2 Comment
calling waitForRequests()
run std::unique_lock<std::mutex> lk(mtx_requests); Lock mtx_requests
calling onHttpRequest()
run std::unique_lock<std::mutex> lk(mtx_requests); Block wait mtx_requests
run cv_got_events.wait_for(...) and predicate function returns false
Try to lock mtx_requests Block wait mtx_requests

The STL implementation of std::condition_variable::wait_for is locking the lock again so it can block the running thread, and std::condition_variable::notify_one or std::condition_variable::notify_all will unlock it and let the blocked thread continue. But in the execution sequence, there will be a deadlock.Because Thread 2 is also blocked and will not call cv_got_events.notify_one();.

@lalitb
Copy link
Member Author

lalitb commented Aug 18, 2023

unfortunately, the problem is now happening in ubuntu 22.04 runner too:

ERROR: /home/runner/work/opentelemetry-cpp/opentelemetry-cpp/api/test/nostd/BUILD:4:8: Linking api/test/nostd/function_ref_test failed: (Exit 1): gcc failed: error executing command /usr/bin/gcc @bazel-out/k8-tsan-fastbuild/bin/api/test/nostd/function_ref_test-2.params

Use --sandbox_debug to see verbose messages from the sandbox and retain the sandbox build root for debugging
/usr/bin/ld.gold: error: cannot open libtsan_preinit.o: No such file or directory

@ThomsonTan
Copy link
Contributor

unfortunately, the problem is now happening in ubuntu 22.04 runner too:

ERROR: /home/runner/work/opentelemetry-cpp/opentelemetry-cpp/api/test/nostd/BUILD:4:8: Linking api/test/nostd/function_ref_test failed: (Exit 1): gcc failed: error executing command /usr/bin/gcc @bazel-out/k8-tsan-fastbuild/bin/api/test/nostd/function_ref_test-2.params

Use --sandbox_debug to see verbose messages from the sandbox and retain the sandbox build root for debugging
/usr/bin/ld.gold: error: cannot open libtsan_preinit.o: No such file or directory

Could we bring the tsan issue to backlog and unblock merging of other PRs?

@lalitb
Copy link
Member Author

lalitb commented Aug 18, 2023

Could we bring the tsan issue to backlog and unblock merging of other PRs?

We have it in backlog (#2266). We can force merge any PR which are failing for tsan, while we are trying to fix this.

@lalitb
Copy link
Member Author

lalitb commented Aug 21, 2023

zpages tests failing for tsan. Have removed them from the tsan build, as zpages would be deprecated soon (#2217)

@lalitb lalitb enabled auto-merge (squash) August 21, 2023 20:27
@lalitb lalitb merged commit d3799a6 into open-telemetry:main Aug 21, 2023
42 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Github CI failing for tsan test
3 participants